Abstract

For a family of linear preferential attachment graphs, we provide 
rates of convergence for the total variation distance between the degree 
of a randomly chosen vertex and an appropriate power law distribution 
as the number of vertices tends to infinity. Our proof uses a new 
formulation of Stein's method for the negative binomial distribution, 
which stems from a distributional transformation that has the negative 
binomial distributions as the only fixed points.

1 INTRODUCTION

Preferential attachment random graphs were introduced in [2] as a stochastic 
mechanism to explain power law degree distributions empirically observed 
in real world networks such as the world wide web. These graphs evolve by 
sequentially adding vertices and edges in a random way so that connections 
to vertices with high degree are favored. There has been much interest in 
properties of these models and their many embellishments; the draft text [19] 
is probably the best survey of this vast literature. Like the seminal work [2] 
(and the mathematically precise formulation [4]), much of this research is 
devoted to showing that if the number of vertices of the graph is large, then 
the proportion of vertices having degree $k$ approximately decays as
$c_\gamma k^{-\gamma}$ for some constant $c_\gamma$ and $\gamma > 1$; the
so-called power law behavior.

Our main result in this vein is Theorem 1.2 below, which, for a family of
linear preferential attachment graphs, provides rates of convergence for the
total variation distance between the degree of a randomly chosen vertex and
an appropriate power law distribution as the number of vertices tends to
infinity. The result is new and the method of proof is also of interest since it
differs substantially from proofs of similar results (e.g. Section 8.5 of [19]).
Our proof of Theorem 1.2 uses a new formulation of Stein's method for the
negative binomial distribution, Theorem 1.5 below (see [17] and references
therein for a basic introduction to Stein's method). The result stems from
a distributional transformation that has negative binomial distributions as
the only fixed points (we shall shortly see the relationship between the
negative binomial distribution and power laws). Similar strategies have recently
found success in analyzing degree distributions in preferential attachment
models, see [15] and Section 6 of [14]; the latter is a special case of our
results and is the template for our proofs. The remainder of the introduction
is devoted to stating our results in greater detail.

First we define the family of preferential attachment models we study;
these are the same models studied in Chapter 8 of [19], which are a
generalization of the models first defined in [4], which in turn are a
formalization of the heuristic models described in [2]. The family of models is
parameterized by $m \in \mathbb{N}$ and $\delta > -m$. For $m = 1$ and given
$\delta$, the model starts with one vertex with a single loop where one end of
the loop contributes to the "in-degree" and the other to the "out-degree."
Now, for $2 \le k \le n$, given the graph with $k-1$ vertices, add vertex $k$
along with an edge emanating "out" from $k$ "in" to a random vertex chosen from
the set $\{1, \dots, k\}$ with probability proportional to the total degree of
that vertex plus $\delta$, where initially vertex $k$ has degree one. That is,
at step $k$, the chance that vertex $k$ connects to itself is
$(\delta + 1)/(k(2 + \delta) - 1)$. After $n$ steps of this process, we
denote the resulting random graph by $G_n^{1,\delta}$.

For $m > 1$, we define $G_n^{m,\delta}$ by first generating
$G_{nm}^{1,\delta/m}$, and then "collapsing" consecutive vertices into groups
of size $m$, starting from the first vertex, and retaining all edges. Note that
with this setup, it is possible for a vertex to connect to itself or other
vertices more than once and as many as $m$ times (in fact the first vertex
always consists of $m$ loops) and all of these connections contribute to the
in- and out-degree of a vertex (e.g. the first vertex has both in- and
out-degree $m$).
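
The construction above can be sketched in a few lines of Python. This is only an illustration under our reading of the definition: the function name `simulate_in_degrees` and its arguments are hypothetical, vertices are stored 0-based, and the quadratic-time sampling is chosen for clarity rather than speed.

```python
import random

def simulate_in_degrees(n, m, delta, seed=None):
    """Simulate G_n^{m,delta}; return the in-degree of each of its n vertices."""
    if m < 1 or delta <= -m:
        raise ValueError("need m >= 1 and delta > -m")
    rng = random.Random(seed)
    d = delta / m              # parameter of the underlying m = 1 model
    total = m * n              # number of vertices before collapsing
    in_deg = [1]               # vertex 1 starts with a single loop
    for k in range(2, total + 1):
        in_deg.append(0)       # vertex k enters with out-degree one
        # Every vertex has exactly one out-edge, so total degree = in_deg + 1;
        # the attachment weight is total degree plus d.
        weights = [x + 1 + d for x in in_deg]
        target = rng.choices(range(k), weights=weights)[0]
        in_deg[target] += 1
    # Collapse consecutive groups of m vertices, retaining all edges.
    return [sum(in_deg[m * i: m * (i + 1)]) for i in range(n)]
```

At step $k$ the weights sum to $k(2+\delta/m) - 1$, in agreement with the self-loop probability stated above, and the in-degrees of the collapsed graph always sum to $nm$.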

Here and below, we think of $\delta$ and $m$ as fixed and let $W_n$ be the
in-degree of a randomly chosen vertex from $G_n^{m,\delta}$. We provide a bound
on the total variation distance between $W_n$ and a limiting distribution which
is a mixture of negative binomial distributions. For $r > 0$ and
$0 < p \le 1$, we say $X \sim \mathrm{NB}(r,p)$ if

\[ P(X = k) = \frac{\Gamma(r+k)}{k!\,\Gamma(r)} (1-p)^k p^r, \qquad k = 0, 1, \dots \]

Definition 1.1. For $m \in \mathbb{N}$, $\delta > -m$ and $U$ uniform on
$(0,1)$, denote the mixture distribution
$\mathrm{NB}(m + \delta, U^{1/(2+\delta/m)})$ by $K(m, \delta)$.

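For our main result, we define the total variation distance between two
non-negative integer valued random variables $X$, $Y$

\[ d_{\mathrm{TV}}(\mathscr{L}(X), \mathscr{L}(Y)) = \sup_{A \subseteq \mathbb{Z}_+} |P(X \in A) - P(Y \in A)| \tag{1.1} \]

\[ = \frac{1}{2} \sum_{k \in \mathbb{Z}_+} |P(X = k) - P(Y = k)|, \tag{1.2} \]

where here and below $\mathbb{Z}_+ = \{0, 1, \dots\}$.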
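
As a small numerical illustration of the equality of (1.1) and (1.2), the following sketch computes both forms for two finitely supported point probability vectors. The helper name `tv_distance` is ours; the supremum in (1.1) is evaluated at the set $A = \{k : P(X=k) > P(Y=k)\}$, where it is attained.

```python
def tv_distance(p, q):
    """Return ((1.2), (1.1)) for two pmfs on {0, 1, ...} given as lists."""
    size = max(len(p), len(q))
    p = list(p) + [0.0] * (size - len(p))
    q = list(q) + [0.0] * (size - len(q))
    # (1.2): half the l1 distance between the point probabilities.
    half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    # (1.1): the supremum is attained at A = {k : p_k > q_k}.
    sup_over_sets = sum(a - b for a, b in zip(p, q) if a > b)
    return half_l1, sup_over_sets
```

For example, `tv_distance([0.5, 0.5], [0.25, 0.25, 0.5])` returns the same value in both coordinates.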

Theorem 1.2. If $W_n$ is the in-degree of a randomly chosen vertex from
the preferential attachment graph $G_n^{m,\delta}$ and $K(m, \delta)$ is the
mixed negative binomial distribution of Definition 1.1, then for some constant
$C_{m,\delta}$,

\[ d_{\mathrm{TV}}(\mathscr{L}(W_n), K(m,\delta)) \le C_{m,\delta} \frac{\log(n)}{n}. \]

To see the power law behavior of $K(m, \delta)$, we record the following easy
result which is a more standard representation of $K(m, \delta)$ through its
point probabilities. The proof follows from direct computation and then
Stirling's formula (or Lemma 3.3 below). These formulas with additional
discussion are also found in Section 8.3 of [19], specifically (8.3.2), and
(8.3.9-10). The representation of $K(m, \delta)$ as a mixture of negative
binomial distributions does not seem to be well known.

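Lemma 1.3. If $m \in \mathbb{N}$, $\delta > -m$, and $Z \sim K(m, \delta)$,
then for $l = 0, 1, \dots$,

\[ P(Z = l) = \left(2 + \frac{\delta}{m}\right) \frac{\Gamma(l + m + \delta)\,\Gamma(m + 2 + \delta + \frac{\delta}{m})}{\Gamma(m + \delta)\,\Gamma(l + m + 3 + \delta + \frac{\delta}{m})}, \]

and for $c_{m,\delta} = (2 + \delta/m)\,\Gamma(m + 2 + \delta + \delta/m)/\Gamma(m + \delta)$,

\[ P(Z = k) \sim c_{m,\delta}\, k^{-(3 + \delta/m)} \quad \text{as } k \to \infty. \]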
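
The "direct computation" behind Lemma 1.3 can be checked numerically. The sketch below is illustrative only: the function names are ours, and the mixture of Definition 1.1 is approximated by a midpoint rule in $u$ (an arbitrary choice of quadrature) and compared with the closed form.

```python
import math

def nb_pmf(k, r, p):
    """P(X = k) for X ~ NB(r, p), computed on the log scale."""
    if p >= 1.0:
        return 1.0 if k == 0 else 0.0
    return math.exp(math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
                    + k * math.log1p(-p) + r * math.log(p))

def k_pmf(l, m, delta):
    """Closed form of Lemma 1.3 for P(Z = l), Z ~ K(m, delta)."""
    a = 2.0 + delta / m
    r = m + delta
    return a * math.exp(math.lgamma(l + r) + math.lgamma(r + a)
                        - math.lgamma(r) - math.lgamma(l + 1 + r + a))

def k_pmf_mixture(l, m, delta, grid=20000):
    """Midpoint-rule approximation of E[NB(m+delta, U^{1/(2+delta/m)}) pmf at l]."""
    a = 2.0 + delta / m
    r = m + delta
    total = sum(nb_pmf(l, r, ((j + 0.5) / grid) ** (1.0 / a)) for j in range(grid))
    return total / grid
```

For $m = 1$, $\delta = 0$ the closed form reduces to $4/((l+1)(l+2)(l+3))$, and the ratio `k_pmf(2 * k, m, delta) / k_pmf(k, m, delta)` approaches $2^{-(3+\delta/m)}$ for large `k`, which is the power law decay.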

Before discussing our Stein's method result, we make a few final
remarks. The usual mathematical statement implying power law behavior of
the degrees of a random graph in this setting is that the empirical degree
distribution converges to $K(m, \delta)$ in probability (Theorem 8.2 of [19]).
Such a result implies the total variation distance in Theorem 1.2 tends to
zero (see Exercise 8.14 of [19]), but does not provide a rate. Another result
similar to Theorem 1.2 is Proposition 8.4 of [19] which states that for
$Z \sim K(m, \delta)$

\[ |P(W_n = k) - P(Z = k)| \le C/n, \]

which according to (1.2) neither implies nor is implied by Theorem 1.2.
Finally, regarding other preferential attachment models, our results can likely
be extended to some other models where the limiting distribution is
$K(m, \delta)$, for example where the update rule is the one we consider here,
but the starting graph is not. For other preferential attachment graphs where
the limiting degree distribution is not $K(m, \delta)$ (such as those of [H]),
it may be possible to prove analogs of Theorem 1.2 using methods similar to
ours, but we do not pursue this here.

To state our general result which we use to prove Theorem 1.2 we first
define a distributional transformation. For $r > 0$ and $n \ge 1$ let
$U_{r,n}$ be a random variable having the distribution of the number of white
balls drawn in $n - 1$ draws in a standard Polya urn scheme starting with $r$
"white balls" and 1 black ball. That is, for fixed $r$, we construct
$U_{r,n}$ sequentially by setting $U_{r,1} = 0$, and for $k \ge 1$,

\[ P(U_{r,k+1} = U_{r,k} + 1 \mid U_{r,k}) = 1 - P(U_{r,k+1} = U_{r,k} \mid U_{r,k}) = \frac{r + U_{r,k}}{r + k}. \tag{1.3} \]
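
The recursion (1.3) determines the law of $U_{r,n}$ exactly, and it is short enough to tabulate. The following sketch (the name `polya_urn_pmf` is ours) builds the point probabilities by dynamic programming; $r$ need not be an integer.

```python
def polya_urn_pmf(r, n):
    """Exact pmf of U_{r,n} on {0, ..., n-1}, built from the recursion (1.3)."""
    pmf = [1.0]                       # U_{r,1} = 0
    for k in range(1, n):             # step from U_{r,k} to U_{r,k+1}
        nxt = [0.0] * (k + 1)
        for u, prob in enumerate(pmf):
            up = (r + u) / (r + k)    # probability that a white ball is drawn
            nxt[u + 1] += prob * up
            nxt[u] += prob * (1.0 - up)
        pmf = nxt
    return pmf
```

Two quick sanity checks: for $r = 1$ the result is uniform on $\{0, \dots, n-1\}$, and in general the mean equals $r(n-1)/(r+1)$, since each draw is white with probability $r/(r+1)$.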

Also, for a non-negative integer valued random variable $X$ with finite mean,
we say $X^s$ has the size bias distribution of $X$ if

\[ P(X^s = k) = \frac{k\,P(X = k)}{EX}, \qquad k = 1, 2, \dots \]

Definition 1.4. Let $X$ be a non-negative integer valued random variable
with finite mean and let $X^s$ denote a random variable having the size bias
distribution of $X$. We say the random variable $X^{*r}$ has the
$r$-equilibrium transformation if

\[ X^{*r} \stackrel{d}{=} U_{r,X^s}, \]

where we understand $U_{r,X^s}$ to mean
$\mathscr{L}(U_{r,X^s} \mid X^s = k) = \mathscr{L}(U_{r,k})$.

As we shall see below in Corollary 2.3, $X^{*r} \stackrel{d}{=} X$ if and only
if $X \sim \mathrm{NB}(r, p)$ for some $0 < p < 1$. Thus if some non-negative
integer valued random variable $W$ has approximately the same distribution as
$W^{*r}$, it is plausible that $W$ is approximately distributed as a negative
binomial distribution. The next result makes this heuristic precise. Here and
below we denote the indicator of an event $B$ by $I_B$ or $I[B]$.
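
The fixed point property can be observed numerically. The sketch below is an illustration under our own conventions (truncated point probability vectors, hypothetical function names): it size biases a pmf, mixes the urn laws $\mathscr{L}(U_{r,n})$ over $X^s = n$ as in Definition 1.4, and compares the result with the original pmf.

```python
import math

def nb_pmf(k, r, p):
    """P(X = k) for X ~ NB(r, p) with 0 < p < 1."""
    return math.exp(math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
                    + k * math.log(1.0 - p) + r * math.log(p))

def r_equilibrium_pmf(pmf, r):
    """pmf of X^{*r} = U_{r, X^s} for X with the given (truncated) pmf."""
    mean = sum(k * q for k, q in enumerate(pmf))
    out = [0.0] * len(pmf)
    urn = [1.0]                                  # pmf of U_{r,1}
    for n in range(1, len(pmf)):
        size_bias = n * pmf[n] / mean            # P(X^s = n)
        for u, prob in enumerate(urn):
            out[u] += size_bias * prob           # mix U_{r,n} over X^s = n
        nxt = [0.0] * (n + 1)                    # advance the urn to U_{r,n+1}
        for u, prob in enumerate(urn):
            up = (r + u) / (r + n)
            nxt[u + 1] += prob * up
            nxt[u] += prob * (1.0 - up)
        urn = nxt
    return out

def max_gap(pmf, r):
    """Largest pointwise difference between the pmfs of X and X^{*r}."""
    return max(abs(a - b) for a, b in zip(pmf, r_equilibrium_pmf(pmf, r)))
```

For a negative binomial input with matching $r$ the gap is at the level of rounding error, while for, say, a Poisson input it is visibly positive.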
Theorem 1.5. Let $W$ be a non-negative integer valued random variable
with $EW = \mu$. Let also $r > 0$ and $W^{*r}$ be coupled to $W$ and have the
$r$-equilibrium transformation of Definition 1.4. If $p = r/(r + \mu)$ and
$c_{r,p} = \min\{(r + 2)(1 - p), 2 - p\} \le 2$, then for an event $B$

\[ d_{\mathrm{TV}}(\mathscr{L}(W), \mathrm{NB}(r,p)) \le c_{r,p}\, E\big[I_B |W^{*r} - W|\big] + 2(e \max\{1, r\} + 1) P(B^c) \]
\[ \le 2(e \max\{1, r\} + 1) P(W^{*r} \ne W). \]

Remark 1.6. Analogs of Theorem 1.5 for other distributions which use
fixed points of distributional transformations are now well established in
the Stein's method literature. For example, the book [3] develops Stein's
method for Poisson approximation using the fact that a non-negative
integer valued random variable $X$ with finite mean has the Poisson distribution
if and only if $X \stackrel{d}{=} X^s - 1$. Also there is the zero bias
transformation for the normal distribution [5j, the equilibrium transformation
for the exponential distribution [13], a less standard distribution [15], and
the special case where $r = 1$ above, the discrete equilibrium transformation
for the geometric distribution [14] (see also [12] for an unrelated
transformation used for geometric approximation).

Remark 1.7. The fact that negative binomial distributions are the fixed
points of the $r$-equilibrium transformation is the discrete analog of the fact,
perhaps more familiar, that a non-negative random variable $X$ has the
gamma distribution with shape parameter $\alpha$ if and only if

\[ X \stackrel{d}{=} B_{\alpha,1} X^s, \]

where $B_{\alpha,1}$ is a beta variable with density $\alpha x^{\alpha - 1}$
for $0 < x < 1$ independent of $X^s$; see [E].

The layout of the remainder of the article is as follows. In Section 2
we develop Stein's method for the negative binomial distribution using the
$r$-equilibrium transformation and prove Theorem 1.5. In Section 3 we use
Theorem 1.5 to prove Theorem 1.2.

2 NEGATIVE BINOMIAL APPROXIMATION 

The proof of Theorem 1.5 roughly follows the usual development of Stein's
method of distributional approximation using fixed points of distributional
transformations (see the references of Remark 1.6). Specifically, if $W$ is
a non-negative integer valued random variable of interest and $Y$ has the
negative binomial distribution, then using the definition (1.1) we want to
bound $|P(W \in A) - P(Y \in A)|$ uniformly for $A \subseteq \mathbb{Z}_+$.
Typically, this program has three components.

1. Define a characterizing operator $\mathcal{A}$ for the negative binomial
distribution which has the property that

\[ E\mathcal{A}g(Y) = 0 \]

for all $g$ in a large enough class of functions if and only if
$\mathscr{L}(Y) \sim \mathrm{NB}(r,p)$.

2. For $A \subseteq \mathbb{Z}_+$, define $g_A$ to solve

\[ \mathcal{A}g_A(k) = I[k \in A] - P(Y \in A). \tag{2.1} \]

3. Using (2.1), note that

\[ |P(W \in A) - P(Y \in A)| = |E\mathcal{A}g_A(W)|. \]

Now use properties of the solutions $g_A$ and the distributional
transformation to bound the right side of this equation.

Obviously there must be some relationship between the characterizing
operator of Item 1 and the distributional transformation of Item 3; this is
typically the subtle part of the program above. For Item 1, we use the
characterizing operator for the negative binomial distribution as defined
in [5].

Theorem 2.1. [5] If $W \ge 0$ has a finite mean, then
$W \sim \mathrm{NB}(r,p)$ if and only if

\[ E\big[(1 - p)(r + W)g(W + 1) - Wg(W)\big] = 0 \tag{2.2} \]

for all bounded functions $g$.
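
A quick numerical check of the characterization (2.2): the sketch below evaluates the expectation for a truncated point probability vector and a bounded test function. The helper names and the particular test functions are our own choices, made only for illustration.

```python
import math

def nb_pmf(k, r, p):
    """P(X = k) for X ~ NB(r, p) with 0 < p < 1."""
    return math.exp(math.lgamma(r + k) - math.lgamma(k + 1) - math.lgamma(r)
                    + k * math.log(1.0 - p) + r * math.log(p))

def stein_expectation(pmf, r, p, g):
    """E[(1-p)(r+W) g(W+1) - W g(W)] for W with the given (truncated) pmf."""
    return sum(q * ((1.0 - p) * (r + k) * g(k + 1) - k * g(k))
               for k, q in enumerate(pmf))
```

With `pmf` equal to the NB(r, p) probabilities the value vanishes up to rounding for every bounded `g`; for a Poisson law with the same mean $r(1-p)/p$ it does not, which reflects the "only if" direction.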

We need to develop the connection between the characterizing operator
of Theorem 2.1 and the $r$-equilibrium transformation. To this end, for a
function $g$ define

\[ D^{(r)}g(k) = (k/r + 1)\,g(k + 1) - (k/r)\,g(k), \]

and note that the negative binomial characterizing operator of (2.2) can be
written

\[ r(1 - p)D^{(r)}g(W) - pWg(W). \tag{2.3} \]

The key relationship is the following.
Lemma 2.2. If the integer valued random variable $X \ge 0$ has finite mean
$\mu > 0$, $X^{*r}$ has the $r$-equilibrium distribution of $X$, and $g$ is a
function such that the expectations below are well defined, then

\[ \mu\, E D^{(r)}g(X^{*r}) = E X g(X). \]

Proof. We show that

\[ E D^{(r)}g(U_{r,n}) = g(n), \tag{2.4} \]

which, using the definition of the size bias distribution implies that

\[ \mu\, E D^{(r)}g(X^{*r}) = \mu\, E g(X^s) = E X g(X), \]

as desired. To show (2.4), we use induction on $n$. The equality is obvious
for $n = 1$ since $U_{r,1} = 0$. Assume that (2.4) holds for $n$ and we show it
holds for $n + 1$. By conditioning on the previous step in the urn process
defining $U_{r,n+1}$ and using (1.3), we find for a function $f$ such that the
expectations below are well defined,

\[ E f(U_{r,n+1}) = \frac{1}{r+n} E\big[(U_{r,n} + r) f(U_{r,n} + 1)\big] + E\left[\left(1 - \frac{U_{r,n} + r}{r+n}\right) f(U_{r,n})\right]. \]

Combining this equality with the induction hypothesis in the form

\[ E\big[(U_{r,n} + r) f(U_{r,n} + 1)\big] = r f(n) + E\big[U_{r,n} f(U_{r,n})\big], \]

yields

\[ E f(U_{r,n+1}) = \frac{r}{r+n} f(n) + \frac{n}{r+n} E f(U_{r,n}). \]

Now taking $f = D^{(r)}g$ and using the induction hypothesis again yields (2.4).

□
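
The identity (2.4) is easy to confirm numerically for small $n$. In the sketch below (function names are ours) the law of $U_{r,n}$ is tabulated from (1.3) and $E D^{(r)}g(U_{r,n})$ is compared with $g(n)$ for an arbitrary test function.

```python
def polya_urn_pmf(r, n):
    """Exact pmf of U_{r,n} on {0, ..., n-1}, from the recursion (1.3)."""
    pmf = [1.0]
    for k in range(1, n):
        nxt = [0.0] * (k + 1)
        for u, prob in enumerate(pmf):
            up = (r + u) / (r + k)
            nxt[u + 1] += prob * up
            nxt[u] += prob * (1.0 - up)
        pmf = nxt
    return pmf

def d_r(g, r):
    """The operator D^{(r)} applied to g, returned as a function of k."""
    return lambda k: (k / r + 1.0) * g(k + 1) - (k / r) * g(k)

def expected_d_r(g, r, n):
    """E D^{(r)} g(U_{r,n}), which (2.4) asserts is equal to g(n)."""
    dg = d_r(g, r)
    return sum(q * dg(u) for u, q in enumerate(polya_urn_pmf(r, n)))
```

Since $U_{r,n}$ has finite support, any real-valued `g` may be used here.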

We now record the following result which, while not necessary for the
proof of Theorem 1.5, underlies our whole approach for negative binomial
approximation.

Corollary 2.3. If the integer valued random variable $X \ge 0$ is such that
$EX = r(1 - p)/p$ for some $0 < p < 1$, then $X \sim \mathrm{NB}(r,p)$ if and
only if

\[ X \stackrel{d}{=} X^{*r}. \]
Proof. If $X \stackrel{d}{=} X^{*r}$ then combining Theorem 2.1 and Lemma 2.2,
noting the representation (2.3), we easily see that $X \sim \mathrm{NB}(r,p)$.

Conversely, assume $Y \sim \mathrm{NB}(r,p)$, and we show
$Y^{*r} \stackrel{d}{=} Y$ using the method of moments. According to (4.3) on
Page 178 of [9],

\[ E\big[U_{r,n}(U_{r,n} - 1) \cdots (U_{r,n} - k + 1)\big] = \frac{r(n - 1) \cdots (n - k)}{r + k}, \]

which implies that for $X$ with finite $k + 1$ moments and $EX = r(1 - p)/p$,

\[ E\big[X^{*r} \cdots (X^{*r} - k + 1)\big] = \frac{r\,E\big[(X^s - 1) \cdots (X^s - k)\big]}{r + k} = \frac{p}{1 - p} \cdot \frac{E\big[X(X - 1) \cdots (X - k)\big]}{r + k}. \]

Now from display (2.29) on Page 84 of [9], if $Y \sim \mathrm{NB}(r,p)$, then

\[ E\big[Y \cdots (Y - k + 1)\big] = r \cdots (r + k - 1) \left(\frac{1-p}{p}\right)^k. \]

Combining this with the calculation above, we find that for all $k \ge 1$,

\[ E\big[Y^{*r} \cdots (Y^{*r} - k + 1)\big] = E\big[Y \cdots (Y - k + 1)\big]. \]

Since $Y$ has a well behaved moment generating function (i.e. exists in a
neighborhood around zero), the moment sequence determines the distribution
and so $Y \stackrel{d}{=} Y^{*r}$, as desired. □

The next two lemmas take care of Item 2 in the program outlined above,
and obtain the properties of the solution for Item 3. We prove Theorem 1.5
immediately after the lemmas. For a function $g : \mathbb{Z}_+ \to \mathbb{R}$,
define $\Delta g(k) = g(k + 1) - g(k)$.

Lemma 2.4. If $Y \sim \mathrm{NB}(r,p)$ and for $A \subseteq \mathbb{Z}_+$,
$g := g_A$ satisfies the Stein equation

\[ (1 - p)(r + k)g(k + 1) - kg(k) = I[k \in A] - P(Y \in A), \tag{2.5} \]

then for $k = 0, 1, \dots$,

\[ |(k + 1)g(k + 1)| \le \frac{\max\{1, r\}\,e}{p} \quad \text{and} \quad |\Delta g(k)| \le \min\left\{\frac{1}{(1 - p)(r + k)}, \frac{1}{k}\right\}. \]
Proof. The second assertion bounding $|\Delta g(k)|$ is Theorem 2.10 applied to
Example 2.9 of [Bj. For the first assertion, note that

\[ (k+1)g(k+1) = \frac{P(Y \in A, Y \le k) - P(Y \in A)P(Y \le k)}{P(Y = k + 1)} \]
\[ = \frac{P(Y \in A, Y \le k)P(Y \ge k + 1) - P(Y \in A, Y \ge k + 1)P(Y \le k)}{P(Y = k + 1)}, \]

so we find

\[ |(k + 1)g(k + 1)| \le \frac{P(Y \le k)P(Y \ge k + 1)}{P(Y = k + 1)}, \tag{2.6} \]

and the bound also holds with either term alone in the numerator.

If $r = 1$ (the geometric distribution), then we can compute (2.6) exactly
as $(1 - (1 - p)^{k+1})/p \le 1/p$, as desired. If $0 < r < 1$, then
Proposition 1(b) of [10] implies that
$P(Y \ge k + 1)/P(Y = k + 1) \le 1/p$, which implies the result in this case.

If $r > 1$, then we bound (2.6) in three cases: $k + 1 \ge r(1 - p)/p$,
$k + 1 \le (r - 1)(1 - p)/p$, and
$(r - 1)(1 - p)/p + 1 \le k + 1 \le r(1 - p)/p - 1$.
For the first case, Proposition 1(b) of [10] implies that for
$k + 1 \ge r(1 - p)/p$,

\[ \frac{P(Y \ge k + 1)}{P(Y = k + 1)} \le \left(1 - (1 - p)\frac{r + k + 1}{k + 2}\right)^{-1}. \tag{2.7} \]

The right hand side is decreasing in $k$, so setting $k + 1 = r(1 - p)/p$ and
simplifying, we find that for $k + 1 \ge r(1 - p)/p$, (2.7) is bounded by
$r/p - r + 1 \le r/p$, as desired. For the other two cases, we use the
representation (see e.g. (2.27) of [1])

\[ P(Y \le k) = \frac{\Gamma(r + k + 1)}{\Gamma(r)\Gamma(k + 1)} \int_0^p u^{r-1}(1 - u)^k \, du, \]

which yields that (2.6) is bounded by

\[ \frac{(k + 1)\int_0^p u^{r-1}(1 - u)^k \, du}{p^r(1 - p)^{k+1}}. \tag{2.8} \]

The maximum of the integrand is achieved at $p^* = (r - 1)/(r + k - 1)$ and
if $k + 1 \le (r - 1)(1 - p)/p$, then $p^* \ge p$ which implies that

\[ \int_0^p u^{r-1}(1 - u)^k \, du \le p^r(1 - p)^k, \]

and thus that (2.8) is bounded by $(k + 1)/(1 - p) \le (r - 1)/p \le r/p$ due to
the restriction on the value of $k$.

Finally, assume $(r - 1)(1 - p)/p + 1 \le k + 1 \le r(1 - p)/p - 1$ and note
that in order for such $k$ to exist, $0 < p \le 1/3$ and we assume this for the
remainder of the proof. With $p^*$ as above,

\[ \int_0^p u^{r-1}(1 - u)^k \, du \le p\,(p^*)^{r-1}(1 - p^*)^k, \]

and the lower bound on the range of $k$ implies that $p^* \le p$ and so we
find (2.8) is bounded above by

\[ \frac{k + 1}{1 - p}\left(\frac{1 - p^*}{1 - p}\right)^k. \tag{2.9} \]

Recalling that $1 - p \le 1 - p^* = k/(r + k - 1)$, it is easy to see that (2.9)
is increasing in $k$. Substituting the maximum value of $k$ for this case,
$r(1 - p)/p - 2$, into $p^*$ and then this into (2.9) and simplifying, we find
that (2.8) is bounded above by

\[ \frac{r}{p}\left(\frac{r/p - 2/(1 - p)}{r/p - 3}\right)^{r/p - r - 2} \le \frac{r}{p}\, e^{3 - 2/(1 - p)} \le \frac{re}{p}, \]

where the first inequality follows since $r/p - r - 2 \le r/p - 3$ and that for
$a, x > 0$, $(1 + a/x)^x \le e^a$.

□



We need the following easy corollary of Lemma 2.4.

Lemma 2.5. If for $A \subseteq \mathbb{Z}_+$, $g := g_A$ satisfies the Stein
equation (2.5), then

\[ \sup_{k \in \mathbb{Z}_+} |D^{(r)}g(k)| \le \frac{\max\{1, r\}\,e + 1}{r(1 - p)}, \]

\[ \sup_{k \in \mathbb{Z}_+} |\Delta(D^{(r)}g(k))| \le \min\left\{1 + \frac{2}{r}, \frac{2 - p}{r(1 - p)}\right\}. \]

Proof. For the first assertion, since $g$ solves the Stein equation (2.5),

\[ |r(1 - p)D^{(r)}g(k)| \le |pkg(k) + I[k \in A] - P(Y \in A)| \]
\[ \le |pkg(k)| + |I[k \in A] - P(Y \in A)| \]
\[ \le \max\{r, 1\}\,e + 1, \]
where we have used Lemma 2.4.

For the second assertion, it is easy to see that

\[ \Delta(D^{(r)}g(k)) = \frac{r + k + 1}{r}\,\Delta g(k + 1) - \frac{k}{r}\,\Delta g(k), \]

and the lemma follows after taking the absolute value, applying the triangle
inequality, and judiciously using Lemma 2.4. □

Proof of Theorem 1.5. Following the usual Stein's method machinery, for
$Y \sim \mathrm{NB}(r,p)$ and $g := g_A$ solving (2.5) for
$A \subseteq \mathbb{Z}_+$ we have

\[ d_{\mathrm{TV}}(\mathscr{L}(W), \mathrm{NB}(r,p)) = \sup_{A \subseteq \mathbb{Z}_+} \big|E\big[I[W \in A] - P(Y \in A)\big]\big| \]
\[ = \sup_{A \subseteq \mathbb{Z}_+} \big|E\big[(1 - p)(r + W)g_A(W + 1) - Wg_A(W)\big]\big| \]
\[ = p \sup_{A \subseteq \mathbb{Z}_+} \big|E\big[\mu D^{(r)}g_A(W) - Wg_A(W)\big]\big|. \]

Lemma 2.2 implies that for $g := g_A$,

\[ pE\big[\mu D^{(r)}g(W) - Wg(W)\big] = p\mu E\big[D^{(r)}g(W) - D^{(r)}g(W^{*r})\big] \]
\[ = p\mu E\big[(D^{(r)}g(W) - D^{(r)}g(W^{*r}))I_B\big] + p\mu E\big[(D^{(r)}g(W) - D^{(r)}g(W^{*r}))I_{B^c}\big] \]
\[ =: R_1 + R_2. \]

Using that $\mu = r(1 - p)/p$, we have

\[ p\mu\,|D^{(r)}g(W) - D^{(r)}g(W^{*r})| \le 2r(1 - p) \sup_{k \in \mathbb{Z}_+} |D^{(r)}g(k)|, \]

and so Lemma 2.5 implies that $|R_2| \le 2(e \max\{1, r\} + 1)P(B^c)$.
To bound $R_1$, we write

\[ D^{(r)}g(W) - D^{(r)}g(W^{*r}) = I[W > W^{*r}] \sum_{k=0}^{W - W^{*r} - 1} \Delta D^{(r)}g(W^{*r} + k) - I[W^{*r} > W] \sum_{k=0}^{W^{*r} - W - 1} \Delta D^{(r)}g(W + k), \]

so that

\[ |D^{(r)}g(W) - D^{(r)}g(W^{*r})| \le \sup_{k \in \mathbb{Z}_+} |\Delta(D^{(r)}g(k))|\,|W^{*r} - W|. \]

Combining this with the bound of Lemma 2.5 we find

\[ |R_1| \le \min\{(r + 2)(1 - p), 2 - p\}\,E\big[|W^{*r} - W| I_B\big], \]

which, upon adding to the bound on $|R_2|$, yields the first bound in the
theorem. The second bound is obtained from the first by choosing
$B = \{W = W^{*r}\}$.

□



3 PREFERENTIAL ATTACHMENT PROOF 

In this section we prove Theorem 1.2 following the strategy of proof of the
main result of Section 6 of [14], which is a special case of our results (it
will likely help the reader to first understand the proof there). We use
$C_{m,\delta}$ to denote a constant only depending on $m$ and $\delta$ which may
change from line to line.

Theorem 1.2 easily follows by the triangle inequality applied to the
following three claims. If $I$ is uniform on $\{1, \dots, n\}$ independent of
$W_{n,i}$ defined to be the in-degree of vertex $i$ in $G_n^{m,\delta}$, and
$\mu_{n,i} := EW_{n,i}$, then

1. $d_{\mathrm{TV}}\big(\mathscr{L}(W_{n,I}), \mathrm{NB}(m + \delta, \frac{m+\delta}{\mu_{n,I} + m + \delta})\big) \le C_{m,\delta} \frac{\log(n)}{n}$,

2. $d_{\mathrm{TV}}\big(\mathrm{NB}(m + \delta, \frac{m+\delta}{\mu_{n,I} + m + \delta}), \mathrm{NB}(m + \delta, (I/n)^{1/(2+\delta/m)})\big) \le C_{m,\delta} \frac{\log(n)}{n}$,

3. $d_{\mathrm{TV}}\big(\mathrm{NB}(m + \delta, (I/n)^{1/(2+\delta/m)}), K(m, \delta)\big) \le C_{m,\delta} \frac{\log(n)}{n}$.

The proofs of Items 2 and 3 are relatively straightforward, while the proof
of Item 1 uses the following result which we show using Stein's method (i.e.
Theorem 1.5).



Theorem 3.1. Retaining the notation and definitions above, we have

\[ d_{\mathrm{TV}}\Big(\mathscr{L}(W_{n,i}), \mathrm{NB}\big(m + \delta, \tfrac{m+\delta}{\mu_{n,i} + m + \delta}\big)\Big) \le \frac{C_{m,\delta}}{i}. \]

The layout of the remainder of this section is as follows. We first collect
and prove some lemmas necessary for the proof of Items 1-3 above and
then prove these results. We prove Theorem 3.1 last, since it is relatively
involved.

Since $G_n^{m,\delta}$ is constructed from $G_{nm}^{1,\delta/m}$ it will be
helpful to denote $W_{k,j}^{(1,\varepsilon)}$ to be the in-degree of vertex $j$
in $G_k^{1,\varepsilon}$ for $k \ge j - 1$, where we set
$W_{j-1,j}^{(1,\varepsilon)} := 0$. The first lemma is useful for computing
moment information; it is a small variation of a special case of the remarkable
results of [11], see also Proposition 8.9 in [19].
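Theorem 3.2. [11] If $j \ge 1$ and $\varepsilon > -1$, then the sequence of
random variables

\[ \frac{\Gamma\big(k + \frac{1+\varepsilon}{2+\varepsilon}\big)}{\Gamma(k + 1)} \big(W_{k,j}^{(1,\varepsilon)} + 1 + \varepsilon\big) \]

is a martingale for $k \ge j - 1$, where we take
$W_{j-1,j}^{(1,\varepsilon)} := 0$. In particular, for $k \ge j - 1$,

\[ EW_{k,j}^{(1,\varepsilon)} + 1 + \varepsilon = (1 + \varepsilon) \frac{\Gamma(k + 1)\,\Gamma\big(j - 1 + \frac{1+\varepsilon}{2+\varepsilon}\big)}{\Gamma\big(k + \frac{1+\varepsilon}{2+\varepsilon}\big)\,\Gamma(j)}. \]

We also need asymptotic estimates for the ratio of gamma functions.
The next result follows from Stirling's approximation.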

Lemma 3.3. For fixed $a, b > 0$, as $z \to \infty$,

\[ \frac{\Gamma(z + a)}{\Gamma(z + b)} = z^{a-b} + O(z^{a-b-1}). \]

The next lemma provides a nice asymptotic expression for expectations 
appearing in the proofs below. 



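Lemma 3.4. If $n \ge i$ and $-\delta < m \in \mathbb{N}$, and
$\mu_{n,i}^{(m,\delta)} := EW_{n,i}$, then

\[ \left|\frac{\mu_{n,i}^{(m,\delta)}}{m + \delta} + 1 - \left(\frac{n}{i}\right)^{1/(2+\delta/m)}\right| \le C_{m,\delta} \left(\frac{n}{i}\right)^{1/(2+\delta/m)} \frac{1}{i}, \]

\[ \left|\frac{m + \delta}{\mu_{n,i}^{(m,\delta)} + m + \delta} - \left(\frac{i}{n}\right)^{1/(2+\delta/m)}\right| \le C_{m,\delta} \left(\frac{i}{n}\right)^{1/(2+\delta/m)} \frac{1}{i}. \]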

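Proof. The second inequality follows directly from the first. For the first
assertion, Theorem 3.2 implies that for $\varepsilon > -1$ and
$\mu_{k,j}^{(1,\varepsilon)} := EW_{k,j}^{(1,\varepsilon)}$, for $k \ge j - 1$,

\[ \mu_{k,j}^{(1,\varepsilon)} + 1 + \varepsilon = (1 + \varepsilon) \frac{\Gamma\big(j - 1 + \frac{1+\varepsilon}{2+\varepsilon}\big)\,\Gamma(k + 1)}{\Gamma(j)\,\Gamma\big(k + \frac{1+\varepsilon}{2+\varepsilon}\big)}. \]

The construction of $G_n^{m,\delta}$ implies that

\[ \mu_{n,i}^{(m,\delta)} = \sum_{j=1}^{m} \mu_{nm,(i-1)m+j}^{(1,\delta/m)}, \]

so we find that

\[ \mu_{n,i}^{(m,\delta)} + m + \delta = \left(1 + \frac{\delta}{m}\right) \frac{\Gamma(nm + 1)}{\Gamma\big(nm + \frac{m+\delta}{2m+\delta}\big)} \sum_{j=1}^{m} \frac{\Gamma\big((i - 1)m + j - 1 + \frac{m+\delta}{2m+\delta}\big)}{\Gamma((i - 1)m + j)}. \tag{3.1} \]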

Using now Lemma 3.3 for the ratios of gamma functions, we find for $i > 1$,



(m,<5) 

+ 1 = - 

7TI + 7TI 



1 l + S/m 



x m 



1 ™ 3+,5/m 

((i-l)m) 2 +^ m +2^0(i 



The lead term equals $(n/i)^{1/(2+\delta/m)}$ (up to the error in changing
$i - 1$ to $i$), and the second order term is easily seen to be as desired. In
the case that $i = 1$, similar arguments starting from (3.1) yield the
appropriate complementary result. □

To prove Items [2] and [3] we have to bound the total variation distance 
between negative binomial distributions having different 'p' parameters. The 
next result is sufficient for our purposes. 



Lemma 3.5. If $r > 0$ and $0 \le \varepsilon < p \le 1$, then

\[ d_{\mathrm{TV}}(\mathrm{NB}(r,p), \mathrm{NB}(r, p - \varepsilon)) \le \frac{r\varepsilon}{p}. \tag{3.2} \]



Proof. Proposition 2.5 of [1] implies that for $r > 0$ (their statement is for
$r \in \mathbb{N}$, but the same proof works for all $r > 0$),



p-e 



d TV (NB(r,p),NB(r,p - e)) = (r + 1 - 1) 
where ^ g(u) ^ 1 and 

r(l- P + g ) +L 
(p-e) 

Using these bounds on $q$ and $l$ in (3.3) implies the lemma.



q(u)du, (3-3) 



□ 
Our final lemma is useful for handling total variation distance for
conditionally defined random variables.

Lemma 3.6. Let $W$ and $V$ be random variables and let $X$ be a random
element defined on the same probability space. Then

\[ d_{\mathrm{TV}}(\mathscr{L}(W), \mathscr{L}(V)) \le E\,d_{\mathrm{TV}}(\mathscr{L}(W \mid X), \mathscr{L}(V \mid X)). \]

Proof. If $f : \mathbb{R} \to [0, 1]$, then

\[ |E[f(W) - f(V)]| \le E\,|E[f(W) - f(V) \mid X]| \le E\,d_{\mathrm{TV}}(\mathscr{L}(W \mid X), \mathscr{L}(V \mid X)). \]

□

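Proof of Theorem 1.2. Using (3.2) and Lemma 3.4 we easily obtain

\[ d_{\mathrm{TV}}\Big(\mathrm{NB}\big(m + \delta, \tfrac{m+\delta}{\mu_{n,i} + m + \delta}\big), \mathrm{NB}\big(m + \delta, (i/n)^{1/(2+\delta/m)}\big)\Big) \le \frac{C_{m,\delta}}{i}, \]

and applying Lemma 3.6 we find

\[ d_{\mathrm{TV}}\Big(\mathrm{NB}\big(m + \delta, \tfrac{m+\delta}{\mu_{n,I} + m + \delta}\big), \mathrm{NB}\big(m + \delta, (I/n)^{1/(2+\delta/m)}\big)\Big) \le C_{m,\delta}\,E[I^{-1}] \le C_{m,\delta} \frac{\log(n)}{n}, \]

which is Item 2 above. Now, we couple $U$ to $I$ by writing $U = I/n - V$,
where $V$ is uniform on $(0, 1/n)$ and independent of $I$. From here, use (3.2),
Lemma 3.6 and then the easy fact that for $i \ge 1$ and $0 < a < 1$,

\[ i^a - (i - 1)^a \le i^{a-1}, \]

to find

\[ d_{\mathrm{TV}}\big(\mathrm{NB}(m + \delta, (I/n)^{1/(2+\delta/m)}), K(m, \delta)\big) \]
\[ = d_{\mathrm{TV}}\big(\mathrm{NB}(m + \delta, (I/n)^{1/(2+\delta/m)}), \mathrm{NB}(m + \delta, U^{1/(2+\delta/m)})\big) \]
\[ \le \frac{C_{m,\delta}}{n} \sum_{i=1}^{n} \frac{(i/n)^{1/(2+\delta/m)} - ((i - 1)/n)^{1/(2+\delta/m)}}{(i/n)^{1/(2+\delta/m)}} \le C_{m,\delta} \frac{\log(n)}{n}, \]

which is Item 3 above. Finally, applying Lemma 3.6 to Theorem 3.1 yields
the claim in Item 1 above so that Theorem 1.2 is proved. □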
The remainder of the section is devoted to the proof of Theorem 3.1.
Since we want to apply our negative binomial approximation framework
we must first construct a random variable having the $(m + \delta)$-equilibrium
distribution of $W_{n,i} := W_{n,i}^{(m,\delta)}$. According to Definition 1.4
we first construct a variable having the size bias distribution of $W_{n,i}$.
To facilitate this construction we need some auxiliary variables.

We mostly work with $G_n^{m,\delta}$ through the intermediate construction of
$G_{nm}^{1,\delta/m}$ discussed in the introduction. To fix notation, if for
$k \ge j$, $W_{k,j}^{(1,\delta/m)}$ is the degree of vertex $j$ in
$G_k^{1,\delta/m}$, then we write

\[ W_{n,i} = \sum_{j=1}^{m} W_{nm,m(i-1)+j}^{(1,\delta/m)}. \tag{3.4} \]

Further, if we let $X_{j,i}^{(\delta/m)}$ be the indicator that vertex $j$
attaches to vertex $i$ in $G_j^{1,\delta/m}$ (and hence also in
$G_k^{1,\delta/m}$ for $j \le k \le mn$), then we also have

\[ W_{nm,m(i-1)+j}^{(1,\delta/m)} = \sum_{k=m(i-1)+j}^{mn} X_{k,m(i-1)+j}^{(\delta/m)}. \tag{3.5} \]

The following well-known result allows us to use the decomposition of
$W_{n,i}$ into a sum of indicators as per (3.4) and (3.5) to size bias
$W_{n,i}$; see e.g. Proposition 2.2 of [7] and the discussion thereafter.

Proposition 3.7. Let $X_1, \dots, X_n$ be zero-one random variables such that
$P(X_j = 1) = p_j$. For each $k = 1, \dots, n$, let $(X_j^{(k)})_{j \ne k}$ have
the distribution of $(X_j)_{j \ne k}$ conditional on $X_k = 1$. If
$X = \sum_{j=1}^{n} X_j$, $\mu = E[X]$, and $K$ is chosen independent of the
variables above with $P(K = k) = p_k/\mu$, then
$X^s = \sum_{j \ne K} X_j^{(K)} + 1$ has the size bias distribution of $X$.
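
To make the recipe of Proposition 3.7 concrete, the following sketch checks it by exact enumeration in the simplest setting of independent indicators, where conditioning on $X_k = 1$ leaves the law of the remaining indicators unchanged. Independence is our simplifying assumption for this illustration (the indicators in the preferential attachment graph are dependent), and the function names are hypothetical.

```python
from itertools import product

def sum_pmf(ps):
    """pmf of a sum of independent Bernoulli(p_j) variables, by enumeration."""
    pmf = [0.0] * (len(ps) + 1)
    for bits in product((0, 1), repeat=len(ps)):
        prob = 1.0
        for b, p in zip(bits, ps):
            prob *= p if b else 1.0 - p
        pmf[sum(bits)] += prob
    return pmf

def size_bias_via_indicators(ps):
    """pmf of X^s = sum_{j != K} X_j^{(K)} + 1 with P(K = k) = p_k / mu."""
    mu = sum(ps)
    out = [0.0] * (len(ps) + 1)
    for k, pk in enumerate(ps):
        others = ps[:k] + ps[k + 1:]     # independence: law of the rest is unchanged
        for x, q in enumerate(sum_pmf(others)):
            out[x + 1] += (pk / mu) * q
    return out

def size_bias_direct(ps):
    """pmf of X^s from the definition P(X^s = k) = k P(X = k) / E X."""
    mu = sum(ps)
    return [k * q / mu for k, q in enumerate(sum_pmf(ps))]
```

The two constructions agree coordinate by coordinate, as the proposition asserts.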

Roughly, Proposition 3.7 implies that in order to size bias $W_{n,i}$, we choose
an indicator $X_{K,L}^{(\delta/m)}$ where for $l = m(i - 1) + 1, \dots, mi$,
$k = l, \dots, mn$, $P(K = k, L = l)$ is proportional to
$P(X_{k,l}^{(\delta/m)} = 1)$ (and zero for other values), then attach vertex
$K$ to vertex $L$ and sample the remaining edges conditional on this event.
Note that given $(K, L) = (k, l)$, in the graphs $G_j^{1,\delta/m}$,
$1 \le j < l$ and $k < j \le nm$, this conditioning does not change the
original rule for generating the preferential attachment graph given
$G_{j-1}^{1,\delta/m}$. The following lemma implies the remarkable fact that in
order to generate the graphs $G_j^{1,\delta/m}$ for $l \le j < k$ conditional
on $X_{k,l}^{(\delta/m)} = 1$ and $G_{j-1}^{1,\delta/m}$, we attach edges
following the same rule as preferential attachment, but include the edge from
vertex $k$ to vertex $l$ in the degree count.

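Lemma 3.8. Retaining the notation and definitions above, for
$l, s \le j < k$ we have

\[ P\big(X_{j,s}^{(\delta/m)} = 1 \,\big|\, X_{k,l}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big) = \frac{I[s = l] + W_{j-1,s}^{(1,\delta/m)} + \delta/m + 1}{j(2 + \delta/m)}, \tag{3.6} \]

where we define $W_{j-1,j}^{(1,\delta/m)} = 0$.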
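Proof. By the definition of conditional probability, we write

\[ P\big(X_{j,s}^{(\delta/m)} = 1 \,\big|\, X_{k,l}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big) = \frac{P\big(X_{j,s}^{(\delta/m)} = 1 \,\big|\, G_{j-1}^{1,\delta/m}\big)\,P\big(X_{k,l}^{(\delta/m)} = 1 \,\big|\, X_{j,s}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big)}{P\big(X_{k,l}^{(\delta/m)} = 1 \,\big|\, G_{j-1}^{1,\delta/m}\big)}, \tag{3.7} \]

and we calculate the three probabilities appearing above. First note

\[ P\big(X_{j,s}^{(\delta/m)} = 1 \,\big|\, G_{j-1}^{1,\delta/m}\big) = \frac{W_{j-1,s}^{(1,\delta/m)} + 1 + \delta/m}{(2 + \delta/m)j - 1}, \]

which implies

\[ P\big(X_{k,l}^{(\delta/m)} = 1 \,\big|\, G_{j-1}^{1,\delta/m}\big) = \frac{E\big[W_{k-1,l}^{(1,\delta/m)} + 1 + \delta/m \,\big|\, G_{j-1}^{1,\delta/m}\big]}{(2 + \delta/m)k - 1}, \]

and

\[ P\big(X_{k,l}^{(\delta/m)} = 1 \,\big|\, X_{j,s}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big) = \frac{E\big[W_{k-1,l}^{(1,\delta/m)} + 1 + \delta/m \,\big|\, X_{j,s}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big]}{(2 + \delta/m)k - 1}. \]

Using Theorem 3.2 it is easy to see that

\[ E\big[W_{k-1,l}^{(1,\delta/m)} + 1 + \delta/m \,\big|\, G_{j-1}^{1,\delta/m}\big] = \frac{\Gamma(k)\,\Gamma\big(j - 1 + \frac{m+\delta}{2m+\delta}\big)}{\Gamma\big(k - 1 + \frac{m+\delta}{2m+\delta}\big)\,\Gamma(j)} \big(W_{j-1,l}^{(1,\delta/m)} + 1 + \delta/m\big), \]

and also

\[ E\big[W_{k-1,l}^{(1,\delta/m)} + 1 + \delta/m \,\big|\, X_{j,s}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big] = \frac{\Gamma(k)\,\Gamma\big(j + \frac{m+\delta}{2m+\delta}\big)}{\Gamma\big(k - 1 + \frac{m+\delta}{2m+\delta}\big)\,\Gamma(j + 1)} \big(W_{j-1,l}^{(1,\delta/m)} + I[s = l] + 1 + \delta/m\big). \]

Combining these calculations with (3.7) and simplifying (using in particular
that $\Gamma(x + 1) = x\Gamma(x)$) implies

\[ P\big(X_{j,s}^{(\delta/m)} = 1 \,\big|\, X_{k,l}^{(\delta/m)} = 1, G_{j-1}^{1,\delta/m}\big) = \frac{W_{j-1,s}^{(1,\delta/m)} + 1 + \delta/m}{j(2 + \delta/m)} \cdot \frac{W_{j-1,l}^{(1,\delta/m)} + I[s = l] + 1 + \delta/m}{W_{j-1,l}^{(1,\delta/m)} + 1 + \delta/m}. \tag{3.8} \]

Considering the cases $s = l$ and $s \ne l$ separately yields that (3.8) equals
(3.6).

□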

The previous lemma suggests the following (embellished) construction
of $(W_{n,i} \mid X_{k,l}^{(\delta/m)} = 1)$. Here and below we denote
quantities related to this construction by amending $(k, l)$. First we generate
$G_{l-1}^{1,\delta/m}(k, l)$, a graph with $l - 1$ vertices, according to the
usual preferential attachment model. At this point, if $l \ne k$, vertex $l$
and $k$ are added to the graph, along with a vertex labeled $i'$ with an edge
to it emanating from vertex $k$. Given $G_{l-1}^{1,\delta/m}(k, l)$ and these
additional vertices and edges, we generate $G_l^{1,\delta/m}(k, l)$ by
connecting vertex $l$ to a vertex randomly chosen from the vertices
$1, \dots, l, i'$ proportional to their "degree weight," where vertex $l$ has
degree weight $1 + \delta/m$ (from the out-edge) and $i'$ has degree one (from
the in-edge emanating from vertex $k$), and the remaining vertices have degree
weight equal to their degree plus $\delta/m$. For $l < j < k$, we generate the
graphs $G_j^{1,\delta/m}(k, l)$ recursively from
$G_{j-1}^{1,\delta/m}(k, l)$ by connecting vertex $j$ to a vertex randomly
chosen from the vertices $1, \dots, j, i'$ proportional to their degree weight,
where $j$ has degree weight $1 + \delta/m$ (from the out-edge). Note that none
of the vertices $1, \dots, k - 1$ connect to vertex $k$. Also define
$G_k^{1,\delta/m}(k, l) = G_{k-1}^{1,\delta/m}(k, l)$. If $l = k$, we
attach vertex $k$ to $i'$ and denote the resulting graph by
$G_k^{1,\delta/m}(k, l)$. For all values $(k, l)$, if $j = k + 1, \dots, nm$,
we generate $G_j^{1,\delta/m}(k, l)$ from $G_{j-1}^{1,\delta/m}(k, l)$
according to usual preferential attachment among the vertices
$1, \dots, j, i'$.

We have a final bit of notation before stating relevant properties of these
objects. Denote the degree of vertex $j$ in this construction by
$W_{nm,j}^{(1,\delta/m)}(k, l)$ and let also

\[ W_{n,i}(k, l) := \sum_{j=1}^{m} W_{nm,m(i-1)+j}^{(1,\delta/m)}(k, l). \]

Let $B_{k,l}$ be the event that in this construction all edges emanating from
the vertices $m(i - 1) + 1, \dots, mi$ attach to one of the vertices
$1, \dots, m(i - 1)$. In symbols,

\[ B_{k,l} = \Big\{ X_{s,j}^{(\delta/m)}(k, l) = 0 \text{ for all } j \in \{m(i - 1) + 1, \dots, mi, i'\},\ s \in \{m(i - 1) + 1, \dots, mi, i'\} \setminus \{k\} \Big\}. \]

Finally, let $W'$ have the $r$-equilibrium distribution of $W_{n,i}$,
independent of all else and define

\[ W_{n,i}^{*r} = W_{n,i}(K, L)\,I_{B_{K,L}} + W' I_{B_{K,L}^c}. \tag{3.9} \]

Lemma 3.9. Let $l \in \{m(i - 1) + 1, \dots, mi\}$, $k \in \{l, \dots, mn\}$
and retain the notation and definitions above.

1. $\mathscr{L}\big(W_{n,i}(k, l) + W_{nm,i'}^{(1,\delta/m)}(k, l)\big) = \mathscr{L}\big(W_{n,i} \mid X_{k,l}^{(\delta/m)} = 1\big)$.

2. If $(K, L)$ is a random vector such that

\[ P(K = k', L = l') = \frac{P\big(X_{k',l'}^{(\delta/m)} = 1\big)}{\mu_{n,i}}, \qquad k' \ge l' \in \{m(i - 1) + 1, \dots, mi\}, \]

then $W_{n,i}(K, L) + W_{nm,i'}^{(1,\delta/m)}(K, L)$ has the size bias
distribution of $W_{n,i}$.

3. Conditional on the event

\[ \big\{W_{n,i}(K, L) + W_{nm,i'}^{(1,\delta/m)}(K, L) = t\big\}, \]

$\mathscr{L}(W_{n,i}(K, L)\,I[B_{K,L}]) = \mathscr{L}(U_{m+\delta,t}\,I[B_{K,L}])$,
where $U_{r,t}$ has the Polya urn distribution of Definition 1.4 and is
independent of all else.

4. $W_{n,i}^{*r}$ has the $(m + \delta)$-equilibrium distribution of $W_{n,i}$.

Proof. Items 1 and 2 follow from Proposition 3.7 and Lemma 3.8. Item 3
follows since under the conditioning, if $I[B_{K,L}] = 1$, then
$W_{n,i}(K, L)$ is distributed as the number of white balls drawn in $t - 1$
draws from a Polya urn started with $m + \delta$ white balls and 1 black ball
(it's $t - 1$ draws, rather than $t$, since the initial "black ball" degree
from vertex $i'$ is included in the degree count
$W_{n,i}(K, L) + W_{nm,i'}^{(1,\delta/m)}(K, L)$). Item 4 follows from
Items 1-3, using Definition 1.4. □
Proof of Theorem 3.1. We apply Theorem 1.5 to $\mathscr{L}(W_{n,i})$ with
$W_{n,i}^{*r}$ as defined by (3.9). Before constructing the coupling of
$\mathscr{L}(W_{n,i})$ required in Theorem 1.5 we reduce the bound
$P(W_{n,i}^{*r} \ne W_{n,i})$.

First note that due to the form of $W_{n,i}^{*r}$, we have (no matter how
$\mathscr{L}(W_{n,i})$ is coupled)

\[ P(W_{n,i}^{*r} \ne W_{n,i}) = P(W_{n,i}(K, L) \ne W_{n,i}, B_{K,L}) + P(W' \ne W_{n,i}, B_{K,L}^c) \]
\[ \le P(W_{n,i}(K, L) \ne W_{n,i}) + P(B_{K,L}^c). \tag{3.10} \]

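We bound the second term of (3.10) as follows. For $l \in \{m(i-1)+1, \ldots, mi\}$
and $k > mi$, we directly compute

\begin{align*}
\mathbb{P}(B_{k,l}) &= \prod_{j=m(i-1)+1}^{l-1} \frac{m(i-1)\bigl(1+\frac{\delta}{m}\bigr) + j - 1}{j\bigl(2+\frac{\delta}{m}\bigr) - 1}
\prod_{j=l}^{mi} \frac{m(i-1)\bigl(1+\frac{\delta}{m}\bigr) + j - 1}{j\bigl(2+\frac{\delta}{m}\bigr)} \\
&\ge \prod_{j=m(i-1)+1}^{mi} \frac{m(i-1)\bigl(1+\frac{\delta}{m}\bigr) + j - 1}{j\bigl(2+\frac{\delta}{m}\bigr)} \tag{3.11} \\
&= \frac{1}{\bigl(2+\frac{\delta}{m}\bigr)^m} \frac{\Gamma\bigl(m(i-1)(2+\frac{\delta}{m}) + m\bigr)\, \Gamma\bigl(m(i-1)+1\bigr)}{\Gamma\bigl(m(i-1)(2+\frac{\delta}{m})\bigr)\, \Gamma(mi+1)} \\
&= 1 + O(1/i),
\end{align*}

where the last equality uses Lemma 3.3. If $k \in \{m(i-1)+1, \ldots, mi\}$,
then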

( m)= H ?(2 + A)_i 11 ' 

i=m(i-l)+l ■A^+rrJ 1 j=l JV Z ^m> 

which is greater than or equal to (3.11) (since the omitted term is a
probability), so in either case we find

\[
\mathbb{P}(B^c_{K,L}) = O(1/i).
\]
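
As a numerical sanity check of the $1 + O(1/i)$ claim, the sketch below evaluates the product $\prod_{j=m(i-1)+1}^{mi} \bigl(m(i-1)(1+\delta/m) + j - 1\bigr) / \bigl(j(2+\delta/m)\bigr)$ directly and through the Gamma-function expression, and prints $i$ times the distance from 1. The function names and the parameter values $m = 2$, $\delta = 1$ are hypothetical, and the form of the product is our reading of the display labelled (3.11); the code assumes $\delta > -m$ and $i \ge 2$ so that every Gamma argument is positive.

```python
import math


def block_product(m, delta, i):
    """Direct evaluation of the product over j = m(i-1)+1, ..., mi."""
    d = delta / m
    prod = 1.0
    for j in range(m * (i - 1) + 1, m * i + 1):
        prod *= (m * (i - 1) * (1 + d) + j - 1) / (j * (2 + d))
    return prod


def block_product_gamma(m, delta, i):
    """Same quantity via log-Gamma functions (requires i >= 2)."""
    d = delta / m
    a = m * (i - 1) * (2 + d)
    log_value = (
        math.lgamma(a + m) - math.lgamma(a)
        + math.lgamma(m * (i - 1) + 1) - math.lgamma(m * i + 1)
        - m * math.log(2 + d)
    )
    return math.exp(log_value)


if __name__ == "__main__":
    for i in (2, 10, 100, 1000):
        p = block_product(2, 1.0, i)
        # If 1 - p is of order 1/i, the last column stays bounded.
        print(i, p, block_product_gamma(2, 1.0, i), i * (1 - p))
```

The last printed column staying bounded as $i$ grows is consistent with the product being $1 - O(1/i)$, with a constant depending on $m$ and $\delta$.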

It remains only to bound the first term of (3.10), for which we must
first define the coupling of $\mathscr{L}(W_{n,i})$ to $W_{n,i}(K,L)$. For each $(k,l)$ in the
support of $(K,L)$, we construct

\[
\Bigl\{ \bigl( X^{(1,\delta/m)}_{s,j}(k,l),\, X^{(1,\delta/m)}_{s,j} \bigr) : mn \ge s \ge j \in \{m(i-1)+1, \ldots, mi\} \Bigr\}, \tag{3.12}
\]

to have the distribution of the indicators of the events that vertex $s$ connects to
vertex $j$ in $G^{(1,\delta/m)}_{nm}(k,l)$ and $G^{(1,\delta/m)}_{nm}$, respectively. With this fact established,
denote



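\[
W^{(1,\delta/m)}_{nm,j}(k,l) = \sum_{s=j}^{nm} X^{(1,\delta/m)}_{s,j}(k,l)
\quad \text{and} \quad
W^{(1,\delta/m)}_{nm,j} = \sum_{s=j}^{nm} X^{(1,\delta/m)}_{s,j},
\]

which have the distribution of the degree of vertex $j$ in the indicated graphs, and then we
set

\[
W_{n,i}(k,l) = \sum_{j=m(i-1)+1}^{mi} W^{(1,\delta/m)}_{nm,j}(k,l)
\quad \text{and} \quad
W_{n,i} = \sum_{j=m(i-1)+1}^{mi} W^{(1,\delta/m)}_{nm,j}.
\]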



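From this point we bound the first term of (3.10) via

\begin{align*}
\mathbb{P}\bigl(W_{n,i}(k,l) \ne W_{n,i}\bigr)
&= \mathbb{P}\Biggl( \sum_{j=m(i-1)+1}^{mi} W^{(1,\delta/m)}_{nm,j}(k,l) \ne \sum_{j=m(i-1)+1}^{mi} W^{(1,\delta/m)}_{nm,j} \Biggr) \\
&\le \sum_{j=m(i-1)+1}^{mi} \mathbb{P}\bigl( W^{(1,\delta/m)}_{nm,j}(k,l) \ne W^{(1,\delta/m)}_{nm,j} \bigr), \tag{3.13}
\end{align*}

and we show each term in the sum is $O(1/i)$ (with a constant depending on $m$ and $\delta$, but
not on $k, l$), which establishes the theorem.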

The constructions for different orders of $j, k, l$ are slightly different, so
assume first that $j < l < k$. Let $U_{s,j}(k,l)$ be independent uniform $(0,1)$ random
variables and, for the sake of brevity, let $w = 1 + \delta/m$. First define

\[
X^{(1,\delta/m)}_{j,j}(k,l) = I\biggl[ U_{j,j}(k,l) < \frac{w}{j(2+\delta/m) - 1} \biggr],
\]

and for $j < s < l$, given $W^{(1,\delta/m)}_{s-1,j}(k,l)$,

\[
X^{(1,\delta/m)}_{s,j}(k,l) = I\biggl[ U_{s,j}(k,l) < \frac{W^{(1,\delta/m)}_{s-1,j}(k,l) + w}{s(2+\delta/m) - 1} \biggr].
\]

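Also let $X^{(1,\delta/m)}_{s,j} = X^{(1,\delta/m)}_{s,j}(k,l)$ for $j \le s < l$. That is, we can perfectly
couple the degrees of vertex $j$ in the two graphs up until vertex $l$ arrives.
Now, for $l \le s < k$, given $W^{(1,\delta/m)}_{s-1,j}(k,l)$ and $W^{(1,\delta/m)}_{s-1,j}$, define

\[
X^{(1,\delta/m)}_{s,j}(k,l) = I\biggl[ U_{s,j}(k,l) < \frac{W^{(1,\delta/m)}_{s-1,j}(k,l) + w}{s(2+\delta/m)} \biggr], \tag{3.14}
\]

\[
X^{(1,\delta/m)}_{s,j} = I\biggl[ U_{s,j}(k,l) < \frac{W^{(1,\delta/m)}_{s-1,j} + w}{s(2+\delta/m) - 1} \biggr]. \tag{3.15}
\]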



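Set $X^{(1,\delta/m)}_{k,j}(k,l) = 0$ and $X^{(1,\delta/m)}_{k,j}$ as in (3.15) with $s = k$, and for $s > k$, define

\[
X^{(1,\delta/m)}_{s,j}(k,l) = I\biggl[ U_{s,j}(k,l) < \frac{W^{(1,\delta/m)}_{s-1,j}(k,l) + w}{s(2+\delta/m) - 1} \biggr],
\qquad
X^{(1,\delta/m)}_{s,j} = I\biggl[ U_{s,j}(k,l) < \frac{W^{(1,\delta/m)}_{s-1,j} + w}{s(2+\delta/m) - 1} \biggr].
\]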

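For $j < l < k$, we have jointly and recursively defined the variables
$X^{(1,\delta/m)}_{s,j}(k,l)$ and $X^{(1,\delta/m)}_{s,j}$, and it is clear they are distributed as claimed
above with $W^{(1,\delta/m)}_{s,j}(k,l)$ and $W^{(1,\delta/m)}_{s,j}$ the required degree counts. Note
also that $X^{(1,\delta/m)}_{s,j} \ge X^{(1,\delta/m)}_{s,j}(k,l)$ and $W^{(1,\delta/m)}_{s,j} \ge W^{(1,\delta/m)}_{s,j}(k,l)$, and now define
the event

\[
A_{s,j}(k,l) = \Bigl\{ \min\bigl\{ j \le t \le nm : X^{(1,\delta/m)}_{t,j} \ne X^{(1,\delta/m)}_{t,j}(k,l) \bigr\} = s \Bigr\}.
\]

Using that $W^{(1,\delta/m)}_{s-1,j} = W^{(1,\delta/m)}_{s-1,j}(k,l)$ under $A_{s,j}(k,l)$, we have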

/ nrn 

p (w£i /m) (*,0 * ^5 m) ) = p U 



■<U sJ (k,l)< s hJ 



r(l,S/m) 



s=l 



*(2 + £) 



«(2 + ^)-l 



fc-1 



s(2 + 



s(2 + 



Now using Theorem 3.2, the estimates in Lemma 3.4, and the fact that
$j, l \in \{m(i-1)+1, \ldots, mi\}$, we find



[W^ m \kJ)^W^ 



if 



i 



i 



k(2 + 5) 

fc \l/(2+*/m) j ^1/(2+5/™) 1 
j) 



S (2 + A)_i s(2 + 



- 



m,<5 
For the case $l < j < k$, the coupling is similar to that above, except it starts
from (3.14) and (3.15) for $j \le s < k$; the probability estimates are also
similar. If $j > k$, then it is easy to see that the variables can be perfectly
coupled. If $j = k$ or $j < l = k$, then the analog of the coupling above
can only differ if the edge emanating from vertex $k$ connects to $j$ in $G^{(1,\delta/m)}_{nm}$,
which occurs with chance of order

/, n l/(2+5/m) -, 

(j) i = ^" 

Thus, for any $k, l$ in the support of $(K,L)$ and $j \in \{m(i-1)+1, \ldots, mi\}$,
each of the $m$ terms in the sum (3.13) is bounded above by $C_{m,\delta}/i$, which
establishes the result. □
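
The shared-uniform coupling used in the proof for the case $j < l < k$ can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' construction verbatim: the function names are hypothetical, `d` stands for $\delta/m$, the thresholds are our reading of (3.14) and (3.15), and the treatment of the boundary step $s = l$ (using the smaller threshold from $s = l$ onward) is an assumption. Only the degree of the single vertex $j$ is tracked, as in the proof.

```python
import random


def coupled_indicators(j, l, k, nm, d, rng):
    """Return (x_orig, x_cond) for s = j, ..., nm.

    x_orig[s - j] indicates that vertex s attaches to vertex j in the
    original graph, x_cond[s - j] the same in the graph conditioned on
    vertex k attaching to vertex l. Both are driven by the same uniforms.
    Assumes d > -1 and 1 <= j < l < k <= nm.
    """
    if not (1 <= j < l < k <= nm):
        raise ValueError("this sketch only covers 1 <= j < l < k <= nm")
    w = 1.0 + d
    x_orig, x_cond = [], []
    deg_orig = deg_cond = 0  # running degree counts of vertex j
    for s in range(j, nm + 1):
        u = rng.random()
        total = s * (2.0 + d)
        hit_orig = u < (deg_orig + w) / (total - 1.0)
        if s < l:
            hit_cond = hit_orig  # identical thresholds: perfect coupling
        elif s < k:
            hit_cond = u < (deg_cond + w) / total  # smaller threshold
        elif s == k:
            hit_cond = False  # vertex k is forced to attach to l, not j
        else:
            hit_cond = u < (deg_cond + w) / (total - 1.0)
        x_orig.append(hit_orig)
        x_cond.append(hit_cond)
        deg_orig += hit_orig
        deg_cond += hit_cond
    return x_orig, x_cond


def disagreement_frequency(j, l, k, nm, d, trials, seed=0):
    """Monte Carlo estimate of P(the two final degree counts differ)."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x_orig, x_cond = coupled_indicators(j, l, k, nm, d, rng)
        bad += (sum(x_orig) != sum(x_cond))
    return bad / trials


if __name__ == "__main__":
    # Hypothetical parameters with j and l in the same block of size m = 2.
    print(disagreement_frequency(j=19, l=20, k=60, nm=400, d=0.5, trials=2000))
```

Because the indicators in the conditioned graph never exceed those in the original graph, the two degree counts differ exactly when some pair of indicators differs, which is what the events $A_{s,j}(k,l)$ track.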